Unsupervised Text Annotation
Abstract
We introduce UTA, an unsupervised text annotation model that iteratively populates a document-specific database containing each document's symbolic content description. The model identifies the most closely related documents using both the document text and the symbolic content descriptions. UTA then extends the database of one document with data from its related documents without sacrificing precision.
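The abstract does not specify how relatedness is measured or how precision is protected, so the following is only a minimal sketch of iterative annotation propagation under stated assumptions. Every name in it (`Document`, `relatedness`, `propagate`), the Jaccard similarity, and the `top_k`, `min_score`, and `min_support` parameters are hypothetical choices made for illustration, not part of UTA as published.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for a document and its per-document "database".
    doc_id: str
    tokens: set
    annotations: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness(d1, d2, text_weight=0.5):
    """Assumed score: weighted mix of text overlap and annotation overlap."""
    return (text_weight * jaccard(d1.tokens, d2.tokens)
            + (1 - text_weight) * jaccard(d1.annotations, d2.annotations))


def propagate(docs, top_k=2, min_score=0.3, min_support=2, max_rounds=5):
    """Iteratively copy annotations from related documents.

    An annotation is copied only when at least `min_support` of the
    selected neighbours carry it; this is an assumed precision guard.
    """
    for _ in range(max_rounds):
        updates = {}
        for doc in docs:
            # Rank all other documents by relatedness to this one.
            scored = sorted(
                ((relatedness(doc, other), other)
                 for other in docs if other is not doc),
                key=lambda pair: pair[0],
                reverse=True,
            )
            neighbours = [o for s, o in scored[:top_k] if s >= min_score]
            # Count how many neighbours support each missing annotation.
            counts = {}
            for n in neighbours:
                for label in n.annotations - doc.annotations:
                    counts[label] = counts.get(label, 0) + 1
            new = {label for label, c in counts.items() if c >= min_support}
            if new:
                updates[doc.doc_id] = new
        if not updates:
            break  # fixed point: nothing left to propagate
        for doc in docs:
            doc.annotations |= updates.get(doc.doc_id, set())
    return docs


docs = [
    Document("a", {"paris", "france", "capital", "city"}, {"Location:Paris"}),
    Document("b", {"paris", "france", "capital", "seine"},
             {"Location:Paris", "Country:France"}),
    Document("c", {"paris", "france", "city", "seine"},
             {"Location:Paris", "Country:France"}),
    Document("d", {"python", "code", "compiler"}, {"Topic:Programming"}),
]
result = {d.doc_id: d.annotations for d in propagate(docs)}
```

In this toy run, document `a` gains `Country:France` because both of its selected neighbours carry it, while the unrelated document `d` is left unchanged. Raising `min_support` or `min_score` trades coverage for precision in this sketch.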
Similar Resources
Evaluating Term-Expansion for Unsupervised Image Annotation
Automatic image annotation (AIA) deals with the problem of automatically providing images with labels/keywords that describe their visual content. Unsupervised AIA methods are often preferred because they can assign (virtually) any possible concept to images and, unlike their supervised counterparts, do not require labeled data. Unsupervised AIA methods use a reference collection of images with ass...
Image-Text Dataset Generation for Image Annotation and Retrieval
This paper presents a new dataset of images gathered from the Web with corresponding text obtained from the webpages near where the images appeared. Already extracted features are provided to ease the dataset usage for other researchers. An initial release of 250,000 images is targeted at automatic image annotation with unsupervised data. This dataset is the one being used for the ImageCLEF 201...
News Image Annotation on a Large Parallel Text-image Corpus
In this paper, we present a multimodal parallel text-image corpus, and propose an image annotation method that exploits the textual information associated with images. Our corpus contains news articles composed of text, images and image captions, and is significantly larger than the other news corpora proposed in image annotation papers (27,041 articles and 42,568 captioned images). In our e...
Painless Labeling with Application to Text Mining
Labeled data is not readily available for many natural language domains, and it typically requires expensive human effort with considerable domain knowledge to produce a set of labeled data. In this paper, we propose a simple unsupervised system that helps us create a labeled resource for categorical data (e.g., a document set) using only fifteen minutes of human input. We utilize the labeled r...
On Automatic Annotation of Images with Latent Space Models
Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text an...